Visualizing spatial data in R can be a challenging task. It is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, etc. Using those methods, one can plot the basic geographic information of (for instance) a shape file containing polygons for areal data or points for point referenced data.
Here we have new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps.
Areal data is data which corresponds to geographical extents with polygonal boundaries. A typical example is the number of residents per zip code. (See Fig. 1 left) When plotting zip codes it is helpful to also see major roads and other landmarks which form the boundaries of areal units.
Since we can’t easily contextualize a scatterplot of points without any background information at all, it is common to add points as an overlay of some areal data—whatever areal data is available. (See Fig. 1 right)
This plot leaves out many practical details, e.g., are most of the events to the east or west of landmark x? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can’t really be answered using these kinds of graphics because we don’t think in terms of small scale areal boundaries (e.g. zip codes or census tracts).
With a little effort better plots can be made, and tools such as maps, maptools, sp, or RgoogleMaps make the process much easier; in fact, RgoogleMaps was the inspiration for ggmap.
Mapmaking in R has made progress, e.g., the excellent interactive GUI-driven DeducerSpatial package based on Bing Maps. ggmap takes another step in this direction by situating the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. The result is a spatial plot resembling Fig. 2.
library(ggmap)
## Loading required package: ggplot2
murder <- subset(crime, offense == "murder")
# qmplot(lon, lat, data = murder, colour = I('red'), size = I(3), darken = .3)
By definition, the layered grammar demands that every plot consist of five components :
Since ggplot2 is an implementation of the layered grammar of graphics, every plot made with ggplot2 has each of the above elements. Consequently, ggmap plots also have these elements, but certain elements are fixed to map components : the x aesthetic is fixed to longitude, the y aesthetic is fixed to latitude, and the coordinate system is fixed to the Mercator projection.
The major theoretical advantage of using the layered grammar in plotting maps is that aesthetic scales are kept consistent. In the typical situation where the map covers the extent of the data, in ggmap the latitude and longitude scales key off the map (by default) and one scale is used for those axes. The same is true of colors, fills, alpha blendings, and other aesthetics which are built on top of the map when other layers are presented—each is allotted one scale which is kept consistent across each layer of the plot. This aspect of the grammar is particularly important for faceted plots in order to make a proper comparison across several plots.
Since the graphics are done in ggplot2 the user can draw from the full range of ggplot2’s capabilities to layer elegant visual content—geoms, stats, scales, etc.—using the usual ggplot2 coding conventions. This was already seen briefly in Fig. 2 where the arguments of qmplot are identical to that of ggplot2’s qplot.
In ggmap, downloading a map as an image and formatting the image for plotting is done with the get_map function. More specifically, get_map is a wrapper function for the underlying functions get_googlemap, get_openstreetmap, get_stamenmap, and get_cloudmademap which accepts a wide array of arguments and returns a classed raster object for plotting with ggmap.
As the most important characteristic of any map is location, the most important argument of get_map is the location argument. Ideally, location is a longitude/latitude pair specifying the center of the map and accompanied by a zoom argument, an integer from 3 to 20 specifying how large the spatial extent should be around the center, with 3 being the continent level and 20 being roughly the single building level. location is defaulted to downtown Houston, Texas, and zoom to 10, roughly a city-scale.
While longitude/latitude pairs are ideal for specifying a location, they are somewhat inconvenient on a practical level. For this reason, location also accepts a character string. The string, whether containing an address, zip code, or proper name, is then passed to the geocode function which then determines the appropriate longitude/latitude coordinate for the center. In other words, there is no need to know the exact longitude/latitude coordinates of the center of the map—get_map can determine them from more colloquial (“lazy”) specifications so that they can be specified very loosely. For example, since
geocode("the white house")
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=the+white+house&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## lon lat
## 1 -77.03653 38.89768
works, “the white house” is a viable location argument. More details on geocode and other utility functions are discussed at the end of this article.
In lieu of a center/zoom specification, some users find a bounding box specification more convenient. To accommodate this form of specification, location also accepts numeric vectors of length four following the left/bottom/right/top convention. This option is not currently available for Google Maps.
While each map source has its own web application programming interface (API), specification of location/zoom in get_map works for each by computing the appropriate parameters (if necessary) and passing them to each of the API specific get_* functions. To ensure that the resulting maps are the same across the various sources for the same location/zoom specification, get_map first grabs the appropriate Google Map, determines its bounding box, and then downloads the other map as needed. In the case of Stamen Maps and CloudMade Maps, this involves a stitching process of combining several tiles (small map images) and then cropping the result to the appropriate bounding box. The result is a single, consistent specification syntax across the four map sources as seen for Google Maps and OpenStreetMap
baylor <- "baylor university"
qmap(baylor, zoom = 14)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=baylor+university&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=baylor+university&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
qmap(baylor, zoom = 14, source = "osm")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=baylor+university&zoom=14&size=%20640x640&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=baylor+university&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
Before moving into the source and maptype arguments, it is important to note that the underlying API specific get_* functions for which get_map is a wrapper provide more extensive mechanisms for downloading from their respective sources. For example, get_googlemap can access almost the full range of the Google Static Maps API.
source and maptype arguments of get_mapThe most attractive aspect of using different map sources (Google Maps, OpenStreetMap, Stamen Maps, and CloudMade Maps) is the different map styles provided by the producer. These are specified
set.seed(500)
df <- round(data.frame(
x = jitter(rep(-95.36, 50), amount = .3),
y = jitter(rep( 29.76, 50), amount = .3)
), digits = 2)
map <- get_googlemap('houston', markers = df, path = df, scale = 2)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=10&size=%20640x640&scale=%202&maptype=terrain&markers=29.6,-95.16%7c30.05,-95.22%7c29.74,-95.07%7c29.5,-95.38%7c29.58,-95.17%7c29.85,-95.54%7c29.51,-95.35%7c29.95,-95.1%7c30.04,-95.16%7c29.89,-95.23%7c29.48,-95.49%7c29.92,-95.12%7c29.97,-95.2%7c29.69,-95.56%7c30.02,-95.22%7c29.48,-95.51%7c29.69,-95.35%7c29.6,-95.29%7c29.56,-95.3%7c29.58,-95.51%7c29.66,-95.61%7c29.51,-95.17%7c29.49,-95.64%7c29.47,-95.34%7c29.98,-95.5%7c29.57,-95.17%7c29.65,-95.24%7c29.48,-95.12%7c29.54,-95.64%7c29.6,-95.58%7c29.67,-95.62%7c30.04,-95.64%7c30.04,-95.29%7c29.61,-95.6%7c29.75,-95.43%7c29.99,-95.15%7c29.94,-95.16%7c29.87,-95.64%7c29.8,-95.1%7c29.86,-95.63%7c29.82,-95.54%7c29.94,-95.41%7c29.66,-95.16%7c29.75,-95.37%7c29.68,-95.15%7c30.05,-95.31%7c29.54,-95.31%7c29.48,-95.38%7c29.64,-95.52%7c29.51,-95.52&path=29.6,-95.16%7c30.05,-95.22%7c29.74,-95.07%7c29.5,-95.38%7c29.58,-95.17%7c29.85,-95.54%7c29.51,-95.35%7c29.95,-95.1%7c30.04,-95.16%7c29.89,-95.23%7c29.48,-95.49%7c29.92,-95.12%7c29.97,-95.2%7c29.69,-95.56%7c30.02,-95.22%7c29.48,-95.51%7c29.69,-95.35%7c29.6,-95.29%7c29.56,-95.3%7c29.58,-95.51%7c29.66,-95.61%7c29.51,-95.17%7c29.49,-95.64%7c29.47,-95.34%7c29.98,-95.5%7c29.57,-95.17%7c29.65,-95.24%7c29.48,-95.12%7c29.54,-95.64%7c29.6,-95.58%7c29.67,-95.62%7c30.04,-95.64%7c30.04,-95.29%7c29.61,-95.6%7c29.75,-95.43%7c29.99,-95.15%7c29.94,-95.16%7c29.87,-95.64%7c29.8,-95.1%7c29.86,-95.63%7c29.82,-95.54%7c29.94,-95.41%7c29.66,-95.16%7c29.75,-95.37%7c29.68,-95.15%7c30.05,-95.31%7c29.54,-95.31%7c29.48,-95.38%7c29.64,-95.52%7c29.51,-95.52&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
ggmap(map, extent = 'device')
with the maptype argument of get_map and must agree with the source argument. Some styles emphasize large roadways, others bodies of water, and still others political boundaries. Some are better for plotting in a black-and-white medium; others are simply nice to look at. This section gives a run down of the various map styles available in ggmap.
Google provides four different familiar types—terrain (default), satellite, roadmap, and hybrid. OpenStreetMap, on the other hand, only provides the default style.
Style is where Stamen Maps and CloudMade Maps really shine. Stamen Maps has three available tile sets—terrain, watercolor, and toner.
qmap(baylor, zoom = 14, source = "stamen", maptype = "watercolor")
qmap(baylor, zoom = 14, source = "stamen", maptype = "toner")
Stamen’s terrain tile set is quite similar to Google’s, but obviously the watercolor and toner tile sets are substantially different than any of the four Google tile sets. The latter, for example, is ideal for black-and-white plotting.
CloudMade Maps takes the tile styling even further by allowing the user to either
ggmap.ggmap, through get_map (or get_cloudmademap) allows for both options. This is a unique feature of CloudMade Maps which really boosts their applicability and expands the possibilities with ggmap. The one minor drawback to using CloudMade Maps is that the user must register with CloudMade to obtain an API key and then pass the API key into get_map with the api_key argument. API keys are free of charge and can be obtained in a matter of minutes.
Both Stamen Maps and CloudMade Maps are built using OpenStreetMap data. These data are contributed by an open community of online users in much the same way Wikipedia is—both are free, both are user-contributed, and both are easily edited. Moreover, OpenStreetMap has data not only on roadways and bodies of water but also individual buildings, fountains, stop signs and other apparent minutiae. The drawback is that (like Google Maps) not all locations are mapped with the same degree of precision, and imperfections may be observed in small-scale out of the way features.
ggmap functionOnce get_map has grabbed the map of interest, ggmap is ready to plot it. The result of get_map is a specially classed raster object (a matrix of colors as hexadecimal character strings).
paris <- get_map(location = "paris")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=paris&zoom=10&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=paris&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
str(paris)
## chr [1:1280, 1:1280] "#C9DEA9" "#C9DEA9" "#C7DCA7" "#C7DCA7" ...
## - attr(*, "class")= chr [1:2] "ggmap" "raster"
## - attr(*, "bb")='data.frame': 1 obs. of 4 variables:
## ..$ ll.lat: num 48.6
## ..$ ll.lon: num 1.91
## ..$ ur.lat: num 49.1
## ..$ ur.lon: num 2.79
qmap(baylor, zoom = 14, maptype = 53428, api_key = api_key, source = "cloudmade")
qmap("houston", zoom = 10, maptype = 58916, api_key = api_key, source = "cloudmade")
The purpose of ggmap is to take the map from the raster object to the screen, and it fulfills this purpose by creating a ggplot object which, when printed, draws the desired map in the graphics device.
While ggmap requires a ggmap object, it accepts a handful of other arguments as well— extent, base_layer, maprange, legend, padding, and darken. With none of these additional arguments, ggmap effectively returns the following ggplot object
ggplot(aes(x = lon, y = lat), data = fourCorners) +
geom_blank() + coord_map("mercator") +
annotation_raster(ggmap, xmin, xmax, ymin, ymax)
where fourCorners is the data frame resulting from applying expand.grid to the longitude and latitude ranges specified in the bb attribute of the ggmap object. Thus, the default base layer of the ggplot2 object created by ggmap is ggplot(aes(x = lon,y = lat),data = fourCorners), and the default x and y aesthetic scales are calculated based on the longitude and latitude ranges of the map.
The extent argument dictates how much of the graphics device is covered by the map. It accepts three possible strings: “normal”, “panel” , and “device” . “normal” situates the map with the usual axis padding provided by ggplot2 and, consequently, one can see the panel behind it. “panel” eliminates this, setting the limits of the plot panel to be the longitude and latitude extents of the map with scale_[x,y]_continuous(expand = c(0,0)). “device” takes this to the extreme by eliminating the axes themselves with the new exported theme_nothing.
base_layer is a call which substitutes the default base layer to the user’s specification. Thus, in the above code the user can change ggplot(aes(x = lon,y = lat),data = fourCorners) to a different call. This is essential for faceting plots since the referent of ggplot2 functions facet_wrap and facet_grid is the base layer. Since changing the base layer changes the base scales and therefore limits of the plot, it is possible that when the base layer is changed only part of the map is visible. Setting the maprange argument to TRUE (it defaults to FALSE) ensures that the map determines the x and y axis limits (longitude and latitude) via the bb attribute of the ggmap object itself, and not the base_layer argument.
ggmap(paris, extent = "normal")
The legend-related arguments of ggmap are legend and padding, and they are only applicable when extent = "device". The legend argument determines where a legend should be drawn on the map if one should be drawn. Its options are “left”, “right” (default), “bottom”, “top”, “topleft”, “bottomleft”, “topright”, “bottomright” and “none”. The first four draw the legend according to ggplot2’s normal specifications (without any axes); the next four draw the legend on top of the map itself similar to ArcGIS; and the last eliminates the legend altogether. padding governs how far from the corner the legend should be drawn.
The darken argument, a suggestion by Jean-Olivier Irisson, tints the map image. The default, c(0,"black"), indicates a fully translucent black layer, i.e. no tint at all. Generally, the first argument corresponds to an alpha blending (0 = invisible, 1 = opaque) and the second argument the color of the tint. If only a number is supplied to the darken argument ggmap assumes a black tint. The tint itself is made by adding a geom_rect layer on top of the map.
Since ggmap returns a ggplot object, the product of ggmap can itself act as a base layer in the ggplot2 framework. This is an incredibly important realization which allows for the full range of ggplot2 capabilities. We now illustrate many of the ways in which this can be done effectively through a case study of violent crime in downtown Houston, Texas.
Crime data were compiled from the Houston Police Department’s website over the period of January 2010–August 2010. The data were lightly cleaned and aggregated using plyr and geocoded using Google Maps (to the center of the block, e.g., 6150 Main St.); the full data set is available in ggmap as the data set crime
str(crime)
## 'data.frame': 86314 obs. of 17 variables:
## $ time : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
## $ date : chr "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
## $ hour : int 0 0 0 0 0 0 0 0 0 0 ...
## $ premise : chr "18A" "13R" "20R" "20R" ...
## $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
## $ beat : chr "15E30" "13D10" "16E20" "2A30" ...
## $ block : chr "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
## $ street : chr "marlive" "telephone" "wickview" "ashland" ...
## $ type : chr "ln" "rd" "ln" "st" ...
## $ suffix : chr "-" "-" "-" "-" ...
## $ number : int 1 1 1 1 1 1 1 1 1 1 ...
## $ month : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ day : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
## $ location: chr "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
## $ address : chr "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
## $ lon : num -95.4 -95.3 -95.5 -95.4 -95.4 ...
## $ lat : num 29.7 29.7 29.6 29.8 29.7 ...
Since we are only interested in violent crimes which take place downtown, we restrict the data set to those qualifiers. To determine a bounding box, we first use gglocator, a ggplot2 analogue of base’s locator function exported from ggmap. gglocator works not only for ggmap plots, but ggplot2 graphics in general.
# find a reasonable spatial extent
qmap('houston', zoom = 13)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
gglocator(2)
## lon lat
## 1 NA NA
## 2 NA NA
# only violent crimes
violent_crimes <- subset(crime, offense != "auto theft" & offense != "theft" & offense != "burglary")
# order violent crimes
violent_crimes$offense <-
factor(violent_crimes$offense,
levels = c("robbery", "aggravated assault", "rape", "murder"))
# restrict to downtown
violent_crimes <- subset(violent_crimes,
-95.39681 <= lon & lon <= -95.34188 &
29.73631 <= lat & lat <= 29.78400)
The analysis performed only concerns data on the violent crimes of aggravated assault, robbery, rape and murder. Note that while some effort was made to ensure the quality of the data, the data were only leisurely cleaned and the data set may still contain errors.
The first step we might want to take is to look at where the individual crimes took place. Modulo some simple ggplot2 styling changes (primarily in the fonts and key-styles of the legends via ggplot2’s guide function).
One of the problems with the bubble chart is overplotting and point size—we can’t really get a feel for what crimes are taking place and where. One way around this is to bin the points and drop the bins which don’t have any samples in them.
The binned plot is the first time we really begin to see the power of having the maps in the ggplot2 framework. While it is actually not a very good plot, it illustrates the practical advantage of the ggplot2 framework with the contextual information of the map—the process of splitting the data frame violent_crimes into chunks based on the offense variable, binning the points of each, and aggregating back into one data set to plot is all done entirely behind the scenes by ggplot2.
What about violent crimes in general? If we neglect the type of offense, we can get a good idea of the spatial distribution of violent crimes by using a contour plot. Since the map image itself is based on ggplot2’s annotation_raster, which doesn’t have a fill aesthetic, we can access the fill aesthetic to make a filled contour plot.
theme_set(theme_bw(16))
HoustonMap <- qmap("houston", zoom = 14, color = "bw", legend = "topleft")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
HoustonMap +
geom_point(aes(x = lon, y = lat, colour = offense, size = offense),
data = violent_crimes)
## Warning: Removed 11 rows containing missing values (geom_point).
HoustonMap +
stat_bin2d(
aes(x = lon, y = lat, colour = offense, fill = offense),
size = .5, bins = 30, alpha = 1/2,
data = violent_crimes
)
What about violent crimes in general? If we neglect the type of offense, we can get a good idea of the spatial distribution of violent crimes by using a contour plot. Since the map image itself is based on ggplot2’s annotation_raster, which doesn’t have a fill aesthetic, we can access the fill aesthetic to make a filled contour plot.
houston <- get_map("houston", zoom = 14)
HoustonMap <- ggmap("houston", extent = "device", legend = "topleft")
HoustonMap +
stat_density2d(
aes(x = lon, y = lat, fill = ..level.., alpha = ..level..),
size = 2, bins = 4, data = violent_crimes,
geom = "polygon"
)
overlay <- stat_density2d(
aes(x = lon, y = lat, fill = ..level.., alpha = ..level..),
bins = 4, geom = "polygon",
data = violent_crimes
)
HoustonMap + overlay + inset(
grob = ggplotGrob(ggplot() + overlay + theme_inset()),
xmin = -95.35836, xmax = Inf, ymin = -Inf, ymax = 29.75
)
These kinds of overlays can be incredibly effective; however, their ability to communicate information can be hindered by the fact that the map overlay can be visually confused with the map itself. This is particularly common when using colored maps. To get around this problem the inset function can be used to insert map insets containing the overlay on a white background with major and minor axes lines for visual guides made possible by the exported function theme_inset.
The image indicates that there are three main hotspots of activity. Each of these three corresponds to locations commonly held by Houstonians to be particularly dangerous locations. From east to west, the hotspots are caused by:
In addition to single plots, the base_layer argument to ggmap or qmap allows for faceted plots. This is particularly useful for spatiotemporal data with discrete temporal components (day, month, season, year, etc.).
This last plot displays one of the known issues with contour plots in ggplot2—a “clipping” or “tearing” of the contours. Aside from that fact (which will likely be fixed in subsequent ggplot2 versions), we can see that in fact most violent crimes happen on Monday, with a distant second being Friday. Friday’s pattern is easily recognizable—a small elevated dot in the downtown bar district and an expanded region to the southwest in the district known as midtown, which has an active nightlife. Monday’s pattern is not as easily explainable.
houston <- get_map(location = "houston", zoom = 14, color = "bw",
source = "osm")
HoustonMap <- ggmap(houston, base_layer = ggplot(aes(x = lon, y = lat),
data = violent_crimes))
HoustonMap +
stat_density2d(aes(x = lon, y = lat, fill = ..level.., alpha = ..level..),
bins = 5, geom = "polygon",
data = violent_crimes) +
scale_fill_gradient(low = "black", high = "red") +
facet_wrap(~ day)